Advanced Structures

It is possible to build an entire application in Oracle using only tables. Granted - it would not be very efficient unless the tables were very small. For this reason, most applications will include one or more indexes on each table. Occasionally an application will make use of Views, but usually only to centralise the code for a common table join or aggregation.

Oracle provides a number of other structures that can be used make the database more efficient and improve performance.

Partitioned Tables
Bitmap Indexes
Bitmap Join Indexes
Clusters
Index Organized Tables
Externally Organized Tables
Global Temporary Tables
Materialized Views
Partition Views
Pipeline / Table Functions
Parameterised Views

Partitioned tables

A partitioned table is a regular table that is broken up into chunks (partitions) that can be individually managed. Rows are distributed amongst the partitions based on the value of one or more columns - the partition key. There are three types of partitions:

List Partitions: Each partition contains rows for one or more distinct values of the partition key. For example, a list partitioned table on the STATUS column might have a separate partition for each value of STATUS - 'OPEN', 'CLOSED', 'CANCELLED', 'PENDING'.
Range Partitions: Each partition contains a contiguous range of values of the partition key. For example, a range partitioned table on the TXN_DATE column might break the table into monthly partitions - 'Jan 2003', 'Fed 2003', etc...
Hash Partitions: Each partition contains rows whose partition key hashes to a given value for the partition. Hash partitioning can break a table up into equal size partitions, even when the partition key does not have equal distribution. Oracle generates its own hash algorithm based on the number of partitions desired. For example, for a hash partitioned table with 20 partitions, Oracle will generate a hash algorithm that returns a number from 1 to 20.

When a SQL contains a WHERE clause on the partition key, eg. WHERE txn_date BETWEEN :b1 and :b1+10, that is not very selective (ie. returns lots of rows), then an index scan could be very inefficient. However a Full Table Scan on a very large table may be even more inefficient. By partitioning the table, Oracle may only need to Full Scan on one or two partitions depending on the WHERE caluse.

Although partitioning can be a developers tool for improving performance, it is even more practical for DBAs who need to reorganise and archive large tables - partitioning allows them to take a parallel or piecemeal approach.

Partitioning should really only be used on large tables (>50,000 rows), and only with the cooperation of the DBA.

Bitmap Indexes

Normal Oracle indexes are known as b-tree indexes. b-tree indexes are virtually useless on low-cardinality columns (columns with only a few distinct values). By the time 4% or more of a table is read via an index, it would be faster to perform a full table scan. Bitmap Indexes do not solve this problem in general, but they do solve a specific case. If a table contains 2 or more low-cardinality columns and they are used in combination in a SQL statement to reduce the number of rows returned, then bitmap indexes can perform faster than a full table scan.

Use bitmap indexes when:

A table has 2 or more low cardinality columns. Create a bitmap index on each column
Two or more bitmap indexed columns are used together in AND and OR conjunctions, using equals (=), IN, and NOT IN conditions.
The WHERE clauses on the bitmap indexed columns, when combined, reduce the rows returned to a small percentage of the table or partition.
The bitmap indexed table has low volatility (does not change much), or all updates are performed by a single session. Bitmap indexes on high volatility tables can cause locking problems.

Bitmap Join Indexes

This section not yet developed.

Clusters

The smallest unit of data that Oracle can read from disk is the block. Depending on the database, a block is usually 8K or 16K, but can be smaller (especially pre-v8 databases) or larger still. Most tables can store many rows in a single block; obviously this will depend on the number, size and content of columns, but 3-10 rows per kilobyte is not uncommon - ie. 24-80 rows per 8K block. So, whenever a single row is read, a futher 40 or more rows will be read at the same time; these rows are frequently discarded. Clusters help us exploit this feature.

In some tables, we will often read collections of rows together. If the rows are stored in a cluster, then we can arrange it so that all rows with the same value of a particular column or columns (the Cluster Key) are stored together in the same block. For example, if 20 rows with the same Cluster Key are accessed from a cluster, they will be spread across only one or two blocks. Without a cluster - worst case scenario - they will be spread across 20 blocks. This worst case is often the likely case unless the rows were all inserted at the same time.

Important: Clusters require skill to set up correctly. They are more difficult to manage, and take up more space than a regular table. Always get approval and assistance from the DBA and/or Database Architect before creating a Cluster.

There are two different types of clusters in Oracle

Index Clusters

In an Index Cluster, rows with the same cluster key are stored together in the same block (where possible), and an accompanying index stores pointers to the cluster blocks for each cluster key. Access speed on an index cluster is the same as that of a table for single row access, but is much faster when multiple rows are processed with the same cluster key.

When a row is inserted into the table, Oracle will check whether any existing rows have the same cluster key. If so, the new row will be inserted into the same block if there is room. Otherwise a new block will be allocated. Oracle will not use any old block like it would for a non-clustered (heap organised) table.

Hash Clusters

In a Hash Cluster, rows with the same cluster key are stored together in the same block (where possible), but an index is not necessary. To access a Hash Cluster, Oracle hashes the cluster key, and uses this hash key to identify a block. If set up correctly, this can be much faster than using an index. Hash clusters are preferred to indexed clusters for this reason for queries with equals (=) or IN list predicates. A hash cluster cannot process range predicates (<, >, LIKE, BETWEEN), so if these are common (or even likely) then an index cluster will be better.

Unlike an Index Cluster, Oracle must reserve space for a Hash Cluster up front. If a hash cluster is declared too large, you can get problems with Full Table Scans. If it is too small, then not all rows with cluster keys that hash to the same value will fit in a single block, so rows will overflow into chained blocks. Once a hash key is chained, any access on that hash key must read 2 (or more) blocks instead of just one. A chained hash table should be re-organised by the DBA until the USER_CLUSTERS.AVE_BLOCKS_PER_KEY statistic on the cluster is 1.

When set up correctly, hash clusters can provide super fast access to individual rows where the cluster key is referenced with equals predicates, because it can go straight to the block containing the matching rows without using an index. The trade off is space, maintenance, and slower full table scans.

A cluster can store the rows for a single table, or it can be configured to cluster rows with the same key for two or more tables.

Single table clusters

Single table clusters have two excellent uses:

By hash clustering on the primary key (or an alternate Unique key), you can get super fast single row access by avoiding an index. This is almost as good as accessing by rowid directly. This technique is ideal for medium-large reference tables where the data is reasonably static - that way you don't have to cater for growth in the cluster. Note: Do not do this with index clusters, as there is no point - you have to use the index anyway.

The same technique also yields impressive results when joining a large (>250,000 rows) clustered table to other large tables. Typically, Oracle would perform a hash join for such an operation requiring one of the tables be converted into an in-memory hash-table. If both tables are large then this hashing operation must be performed in chunks, which affects the performance of the join. If one of the tables is hash-clustered on its primary key, then Oracle does not need to build an in-memory hash table - one already exists on disk! The join method used is actually a Nested-Loops join (without any of the performance problems when using high-volume NL with indexes), but the performance achieved is more comparable with a hash join.
Tables with 5-200 rows per cluster key, where the rows are typically accessed together by the cluster key will benefit from clustering because they will all reside in the same block. Use a hash cluster if access is always using equals predicates on the entire cluster key. Use an index cluster if accessing on part of a concatenated cluster key, or using range (<, >, BETWEEN, LIKE) predicates.

Old mainframe systems that are upgraded from ISAM / VSAM to Relational databases are often classic cases for clustering (these systems are bread-and-butter for IBM's DB2 database, which clusters virtually every table). The nature of the old ISAM/VSAM databases is that rows are very rarely updated, instead the old row remains untouched and a new row is written to replace it. Each row is tagged with a timestamp and a sequence number so that you can always tell which is the latest. In a relational database, this proves a massive headache, since we must read every row for a key in order to determine which is the most recent. If these rows are all stored in separate blocks, then access speed deteriorates markedly with every new row. Clustering these tables provides a huge advantage, since all rows for the same key are stored in a single block. Furthermore, Oracle will reserve space in each block for the addition of new rows with the same cluster key.

Multi table clusters

Multi-table clusters are used when two or more tables share the same cluster key, and the tables are frequently joined on the cluster key. A multi-table cluster allows Oracle to perform a Cluster Join; the most efficient join method available. For low volume joins (ie. pick up a few rows from Table A and look up matching rows in Table B), a cluster join is hardly warranted as an indexed Nested Loops join will perform perfectly well. Consider a mutli-table cluster when high volume (many rows from both tables) joins are more common.

For Cluster Joins, it doesn't matter whether you use an Indexed Cluster or a Hash Cluster. That is only important for initially selecting the rows from the first table, or for single table selects. This is discussed in Single Table Clusters above.

Use multi-table clusters when:

You have mutliple tables involved in frequent or high volume joins on the cluster key, and
The join key (cluster key) is selective on both tables (ie. < 100 rows per key))

Index Organised Tables

Regular tables in Oracle are otherwise known as Heap Organised. ie. They are not really organised at all; any row can reside in pretty much any block.

Sometimes a table will have no columns other than its primary key, and sometimes just a couple of columns other than the primary key. For tables such as these it seems a waste to create both the table and the primary key index as the same information is stored twice. An Index Organised Table allows you to avoid this situation. With an IOT, the index is the table.

An old technique used to avoid table accesses was to add non-key columns to an index. In this way Oracle could get all the data it requires from the index without looking up the table. IOTs allow us to do this even better because we don't compromise the uniqueness of our index. Of the non-key columns in an IOT, you can choose some or all of them to be stored in the same blocks as the index without affecting the uniqueness of the primary key. Other non-key columns are stored in an overflow segment. Queries that access the columns in the overflow segment may be slower because they have to read both the index blocks and the overflow blocks.

Use an IOT when:

Most table access is on the primary key, or a leading subset of the primary key.
The table contains a small number of short non-key columns that are frequently accessed with the primary key.
Full table scans are infrequent, or do not access non-key columns in the overflow segment.

Avoid IOTs with both Overflow Segments and non primary-key indexes. A non-pk index access could possibly read blocks from the non-pk index, the pk index, and the overflow segment. This would be very inefficient.

Externally Organised Tables

Introduced in Oracle v9, Externally organised tables allow SQL to process an external data file as if it were a table. The syntax for EOTs is based on the syntax for SQL*Loader. From v10g onwards, External Tables can use oracle Data Pump instead of SQL*Loader syntax.

The big advantage here is for Data Warehouses that typically load a file into a staging area, and then transform it using table joins and other SQL functions that could not be performed efficiently during the load. EOTs allow you to do both steps at once, saving on the cost of writing and reading the staging table.

Obviously EOTs cannot be indexed or partitioned, so should only be used where a Full Table Scan is desired.

Global Temporary Tables

Many complex processes - particularly those in data warehouses - use temporary tables. Prior to Global Temporary Tables, the application may have code embedded to CREATE TABLE temp_... AS SELECT ... or perhaps a table already existed into which the process inserted rows and deleted them afterwards.

Both of these processes suffer when multiple concurrent sessions require the same temporary table.

A Global Temporary Table solves this problem. A Global Temporary Table is like a regular table, except for the following:

It consumes space only in the Temporary tablespace (usually TEMP).
Rows inserted into a Global Temporary Table cannot be seen by other sessions.
Rows in a Global Temporary Table are automatically purged upon either COMMIT or at the end of a session, depending on how the GTT was defined upon creation.

Materialized Views

It is well known that Oracle views do not consume space; they simply offer an alternative way to access the underlying base tables. Materialized Views are views where the view query has been resolved into a table. ie. They do take up space. They can be configured to refresh upon commit of changes to the base tables, on demand, or periodically.

Depending on the nature of the query in the view, it is also possible for refreshes to act only on those rows that have been inserted, updated, or deleted since the last refresh.

One of the biggest features of Materialized Views is that when you submit a query against the base tables, and the query could be more efficiently resolved by the Materialized View, then Oracle can be made to automatically rewrite the query to use the MV.

This feature makes MVs valuable for aggregates. For instance, we can create daily, weekly, and monthly aggregate materialized views over a base table. When users perform aggregate queries on the base table, Oracle can go straight to the most efficient aggregate MV to resolve the query.

Partition Views

Partition Views enjoyed an all-too-brief popularity with Oracle v7 and were then largely ignored as redundant with the release of Partitioned Tables in Oracle v8. A Partition View is a view of the form:

CREATE VIEW view1 AS
SELECT *
FROM table1
WHERE ...
UNION ALL
SELECT *
FROM table2
WHERE ...
UNION ALL
...etc...

If all of the tables in the partition view have the same column names, data types, indexes, and the component queries of each UNION ALL reference only a single table using SELECT * without sub-queries or CONNECT-BY (phew!) then Oracle can treat SQLs on the view much as it would treat SQLs on the individual tables. For instance, WHERE clauses against the view can be reliably pushed into the WHERE clause of each UNION ALL component and Oracle will choose the best execution path for each component.

Clearly, if component tables of the view have the same columns, data types, and indexes, then it would be preferable in a v8+ database to use a Partitioned Table. This is certainly Oracle's position, because although (as of 10g release 2) they are still supported by the optimizer, they are now no longer described at all in the documentation. To read more about Partitioned Views, you will need a copy of the v7.3 online Oracle documentation - miraculously still available on www.oracle.com!

The other reason that partition views are now largely irrelevant is that the optimizer is becoming more and more adept at merging predicates on queries into the text of the view itself; having the same columns and indexes on each table in a UNION ALL view no longer confers an advantage - except for one case: STAR_TRANSFORMATION. Oracle cannot perform a STAR_TRANSFORMATION on a UNION ALL view that does not meet the criteria of a Partition View.

So, having established that there is at least one reason to meet the stringent Partition View requirements over and above any other UNION ALL view, Partitioned Tables are still easier; why would you not use a Partitioned Table?

Composite partitoned tables (sub-partitions within partitions) support only two types: Hash sub-partitions within Range partitions, and List sub-partitions within Range Partitions. Consider the case where you want to partition your data primarily by unapproved and approved transactions, and then you want to sub-partition the approved transactions by date-range. This (Range sub-partitions within List partitions) cannot be done with Partitioned Tables. However by creating a separate table for Unapproved and Approved transactions, each table can then be partitioned (or not partitioned) in a way that best suits the data.
In a data warehouse, if you want a Materialized View to aggregate statistics from two or more fact tables across common dimensions, then you have two options:
1. Aggregate the tables and then join them across the common dimensions.
2. Aggregate the tables and then concatenate them in a union-all.
Both options can be supported by Materialized Views but are restricted in their refresh method. Option 1 requires a full-outer-join between the aggregated fact tables, so fast-refresh is not possible at all. Option 2 requires a UNION ALL of the aggregated fact tables which is fast-refreshable only for DML (insert/update/delete), but not for DDL (split/merge/drop/truncate partitions) - also known as PCT fast refresh. However by creating one Materialized View aggregate for each fact table, and a Partition View over all of the materialized views, it is possible for the whole to be both fast-refreshable and PCT-fast-refreshable.

Pipeline / Table Functions

As a general rule, querying data from the database can be done more efficiently with pure SQL rather than PL/SQL. Of course, every rule has its exceptions; complex data retieval involving several steps, conditional logic, and/or exception handling, is often not only easier to write, read, and maintain in PL/SQL, but it can also run faster.

The problem with this approach is that only a small part of an overall requirement may be complex enough to warrant PL/SQL; is it possible to just write the complex part in PL/SQL, and then use the result set in pure SQL statements? As of v9i, the answer is Yes. Table Functions, and especially Pipelined Table Functions can be used to convert a Nested Table (populated by a custom PL/SQL function) into an in-memory table that can then be referenced in a SQL statement. When the SQL statement is executed, the PL/SQL function is called, rows are returned to the SQL engine, cast into a table, and then processed by the SQL statement.

In this way, it is possible to:

Use the /*+APPEND*/ hint to direct-path insert the output of a PL/SQL function into a table.
Join the output of a PL/SQL function to other tables, views, or even other table functions.

A normal table function that returns a Nested Table will run to completion, and return all rows in a single Nested Table. This technique is not really scalable, because of the memory required to store the Nested Table as it is being built; performance will suffer as the nested table starts paging to disk. By making the PL/SQL function Pipelined, rows are returned directly to the calling SQL statement rather than being stored in the Nested Table.

Parameterised Views

A Parameterized View is not an Oracle object, but a technique that is sometimes used to counteract problems with selecting from views.

This technique is a bit contentious, as it is usually possible to resolve the problem in a different way. For the sake of an example, consider the following:

A system contains tables CUSTOMER, ORDER, and ORDER_LINE. Say we want to give one of our customers SQL access to our database, but we don't want them to see data for our other customers. When they log in, we could set a package global varaible to their customer ID, and supply them with a series of views that use the package global variable to restrict what is seen. Say we want a view that shows orders for any given customer.

CREATE OR REPLACE VIEW v_order_line_cust AS
SELECT *
FROM   order_line
WHERE  order_id IN (
       SELECT order_id
       FROM   order
       WHERE  cust_id = param_view.get_cust_id()
       )

The sub-query uses a packaged function to return a Customer Id. The package contains the following declarations:

global_cust_id customer.cust_id%type;
procedure set_cust_id(cust_id customer.cust_id%type);
function  get_cust_id return customer.cust_id%type;

The procedure must be called prior to querying the view in order to set the global_cust_id variable. The function simply returns the value of global_cust_id.

Parameterised views also allow us to push predicates into views where it would otherwise be impossible; such as when the column in question is in a sub-query within the view, or when the view uses complex functionality such as CONNECT BY, ROWNUM, etc.